 genomic prediction


DPCformer: An Interpretable Deep Learning Model for Genomic Prediction in Crops

arXiv.org Artificial Intelligence

Genomic Selection (GS) uses whole-genome information to predict crop phenotypes and accelerate breeding. Traditional GS methods, however, struggle with prediction accuracy for complex traits and large datasets. We propose DPCformer, a deep learning model integrating convolutional neural networks with a self-attention mechanism to model complex genotype-phenotype relationships. We applied DPCformer to 13 traits across five crops (maize, cotton, tomato, rice, chickpea). Our approach uses an 8-dimensional one-hot encoding for SNP data, ordered by chromosome, and employs the PMF algorithm for feature selection. Evaluations show DPCformer outperforms existing methods. In maize datasets, accuracy for traits like days to tasseling and plant height improved by up to 2.92%. For cotton, accuracy gains for fiber traits reached 8.37%. On small-sample tomato data, the Pearson Correlation Coefficient for a key trait increased by up to 57.35%. In chickpea, the yield correlation was boosted by 16.62%. DPCformer demonstrates superior accuracy, robustness in small-sample scenarios, and enhanced interpretability, providing a powerful tool for precision breeding and addressing global food security challenges.


Biology-informed neural networks learn nonlinear representations from omics data to improve genomic prediction and interpretability

arXiv.org Artificial Intelligence

We extend biologically-informed neural networks (BINNs) for genomic prediction (GP) and selection (GS) in crops by integrating thousands of single-nucleotide polymorphisms (SNPs) with multi-omics measurements and prior biological knowledge. Traditional genotype-to-phenotype (G2P) models depend heavily on direct mappings that achieve only modest accuracy, forcing breeders to conduct large, costly field trials to maintain or marginally improve genetic gain. Models that incorporate intermediate molecular phenotypes such as gene expression can achieve higher predictive fit, but they remain impractical for GS since such data are unavailable at deployment or design time. BINNs overcome this limitation by encoding pathway-level inductive biases and leveraging multi-omics data only during training, while using genotype data alone during inference. Applied to maize gene-expression and multi-environment field-trial data, BINN improves rank-correlation accuracy by up to 56% within and across subpopulations under sparse-data conditions and nonlinearly identifies genes that GWAS/TWAS fail to uncover. With complete domain knowledge for a synthetic metabolomics benchmark, BINN reduces prediction error by 75% relative to conventional neural nets and correctly identifies the most important nonlinear pathway. Importantly, both cases show highly sensitive BINN latent variables correlate with the experimental quantities they represent, despite not being trained on them. This suggests BINNs learn biologically-relevant representations, nonlinear or linear, from genotype to phenotype. Together, BINNs establish a framework that leverages intermediate domain information to improve genomic prediction accuracy and reveal nonlinear biological relationships that can guide genomic selection, candidate gene selection, pathway enrichment, and gene-editing prioritization.


The race to make the perfect baby is creating an ethical mess

MIT Technology Review

A new field of science claims to be able to predict aesthetic traits, intelligence, and even moral character in embryos. Is this the next step in human evolution or something more dangerous? Consider, if you will, the translucent blob in the eye of a microscope: a human blastocyst, the biological specimen that emerges just five days or so after a fateful encounter between egg and sperm. This bundle of cells, about the size of a grain of sand pulled from a powdery white Caribbean beach, contains the coiled potential of a future life: 46 chromosomes, thousands of genes, and roughly six billion base pairs of DNA--an instruction manual to assemble a one-of-a-kind human. Now imagine a laser pulse snipping a hole in the blastocyst's outermost shell so a handful of cells can be suctioned up by a microscopic pipette. This is the moment, thanks to advances in genetic sequencing technology, when it becomes possible to read virtually that entire instruction manual. An emerging field of science seeks to use the analysis pulled from that procedure to predict what kind of a person that embryo might become. Some parents turn to these tests to avoid passing on devastating genetic disorders that run in their families. A much smaller group, driven by dreams of Ivy League diplomas or attractive, well-behaved offspring, are willing to pay tens of thousands of dollars to optimize for intelligence, appearance, and personality. Some of the most eager early boosters of this technology are members of the Silicon Valley elite, including tech billionaires like Elon Musk, Peter Thiel, and Coinbase CEO Brian Armstrong. Embryo selection is less like a build-a-baby workshop and more akin to a store where parents can shop for their future children from several available models--complete with stat cards. But customers of the companies emerging to provide it to the public may not be getting what they're paying for. 
Genetics experts have been highlighting the potential deficiencies of this testing for years.


LSTM Autoencoder-based Deep Neural Networks for Barley Genotype-to-Phenotype Prediction

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) has emerged as a key driver of precision agriculture, facilitating enhanced crop productivity, optimized resource use, farm sustainability, and informed decision-making. Also, the expansion of genome sequencing technology has greatly increased crop genomic resources, deepening our understanding of genetic variation and enhancing desirable crop traits to optimize performance in various environments. There is increasing interest in using machine learning (ML) and deep learning (DL) algorithms for genotype-to-phenotype prediction due to their excellence in capturing complex interactions within large, high-dimensional datasets. In this work, we propose a new LSTM autoencoder-based model for barley genotype-to-phenotype prediction, specifically for flowering time and grain yield estimation, which could potentially help optimize yields and management practices. Our model outperformed the other baseline methods, demonstrating its potential in handling complex high-dimensional agricultural datasets and enhancing crop phenotype prediction performance.


Graph Machine Learning in Genomic Prediction - KDnuggets

#artificialintelligence

Deep learning is widely known for its flexibility and its capability to uncover complex patterns in large datasets; with these advantages, applications of deep learning in genomics are emerging. One such application is genomic prediction, where the traits of individuals -- like susceptibility to disease or yield-related traits -- are predicted using their genomic information. Understanding how genetic traits correlate with variations in genomes could have many benefits, such as advancing crop breeding and hence improving food security. In this article, we explore how genetic relationships can be exploited alongside genomic information to predict genetic traits, with the aid of graph machine learning algorithms. In genomic prediction, traditional deep learning would use an individual's genomic information -- such as single nucleotide polymorphisms (SNPs) -- as input features to the neural network. A SNP is essentially a difference that occurs at a specific position in an individual's genome.
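To make the idea of combining SNP features with genetic relationships concrete, here is a minimal sketch. It assumes SNPs are coded as allele counts (0, 1, or 2) and that individuals are linked when an identity-by-state similarity exceeds a threshold; both the similarity measure and the threshold are illustrative choices, not necessarily what the article uses.

```python
def ibs_similarity(a, b):
    """Identity-by-state similarity for 0/1/2 allele counts:
    1 minus the mean absolute difference divided by 2."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / (2.0 * len(a))


def build_graph(genotypes, threshold):
    """Return an adjacency dict linking individuals whose similarity
    is at least `threshold` (a hypothetical cut-off)."""
    names = list(genotypes)
    edges = {name: set() for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if ibs_similarity(genotypes[u], genotypes[v]) >= threshold:
                edges[u].add(v)
                edges[v].add(u)
    return edges


# Toy data: each individual is a node whose features are its SNP allele counts.
genotypes = {
    "ind1": [0, 1, 2, 0],
    "ind2": [0, 1, 2, 1],
    "ind3": [2, 1, 0, 2],
}
graph = build_graph(genotypes, threshold=0.8)
print(graph)
```

In a graph learning setup, the allele-count vectors would serve as node features and the adjacency structure would let a model share information between related individuals.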


Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer

#artificialintelligence

We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curve (AUC) in the range 0.58–0.71. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of PGS) with 3–8 times higher risk than typical individuals.
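As a small illustration of the two quantities in this abstract, the sketch below computes a polygenic score as a weighted sum of allele counts and evaluates it with AUC. The weights, genotypes, and labels are made-up toy values; the zero weights only mimic the sparsity that an L1 penalty tends to produce, and no LASSO fit is performed here.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0/1/2) across SNPs."""
    return sum(w * g for w, g in zip(weights, allele_counts))


def auc(scores, labels):
    """Probability that a randomly chosen case scores higher than a
    randomly chosen control; ties count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical per-SNP effect sizes; zeros stand in for SNPs dropped by an L1 penalty.
weights = [0.5, 0.0, -0.25, 0.0]
genotypes = [
    [2, 1, 0, 2],
    [1, 0, 2, 1],
    [0, 2, 1, 0],
    [1, 1, 1, 1],
]
labels = [1, 1, 0, 0]  # 1 = case, 0 = control (toy values)

scores = [polygenic_score(g, weights) for g in genotypes]
print(scores, auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, which gives context for the 0.58–0.71 range reported above.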


What If an Algorithm Could Predict Your Unborn Child's Intelligence?

#artificialintelligence

For years, hopeful parents pursuing in vitro fertilization (IVF) treatment have had the option of screening embryos for severe heritable diseases like cystic fibrosis, hemophilia, and Tay-Sachs disease. These rare and often deadly conditions, known as monogenic disorders, can be easily identified through genetic screening because they arise due to a mutation on a single gene. For doctors, diagnosis is a simple positive or negative. But the diseases that are most likely to shadow the average person's life -- cancer, heart disease, diabetes -- are polygenic, meaning that they result from interactions between thousands of genetic signals. In the past, this has made these diseases -- which kill millions of Americans each year -- all but impossible to screen for with genetic tests. But Genomic Prediction, a New Jersey-based company that analyzes genetic data using machine learning, is hoping to change that.


Eugenics 2.0: We're at the Dawn of Choosing Embryos by Health, Height, and More

MIT Technology Review

Nathan Treff was diagnosed with type 1 diabetes at 24. It's a disease that runs in families, but it has complex causes. More than one gene is involved. And the environment plays a role too. So you don't know who will get it. Treff's grandfather had it, and lost a leg.